58 research outputs found

    Product recognition in store shelves as a sub-graph isomorphism problem

    Full text link
    The arrangement of products in store shelves is carefully planned to maximize sales and keep customers happy. However, verifying compliance of real shelves to the ideal layout is a costly task routinely performed by the store personnel. In this paper, we propose a computer vision pipeline to recognize products on shelves and verify compliance to the planned layout. We deploy local invariant features together with a novel formulation of the product recognition problem as a sub-graph isomorphism between the items appearing in the given image and the ideal layout. This allows for auto-localizing the given image within the aisle or store and improving recognition dramatically.Comment: Slightly extended version of the paper accepted at ICIAP 2017. More information @project_page --> http://vision.disi.unibo.it/index.php?option=com_content&view=article&id=111&catid=7

    Zero-Annotation Object Detection with Web Knowledge Transfer

    Full text link
    Object detection is one of the major problems in computer vision, and has been extensively studied. Most of the existing detection works rely on labor-intensive supervision, such as ground truth bounding boxes of objects or at least image-level annotations. On the contrary, we propose an object detection method that does not require any form of human annotation on target tasks, by exploiting freely available web images. In order to facilitate effective knowledge transfer from web images, we introduce a multi-instance multi-label domain adaption learning framework with two key innovations. First of all, we propose an instance-level adversarial domain adaptation network with attention on foreground objects to transfer the object appearances from web domain to target domain. Second, to preserve the class-specific semantic structure of transferred object features, we propose a simultaneous transfer mechanism to transfer the supervision across domains through pseudo strong label generation. With our end-to-end framework that simultaneously learns a weakly supervised detector and transfers knowledge across domains, we achieved significant improvements over baseline methods on the benchmark datasets.Comment: Accepted in ECCV 201

    Loss Guided Activation for Action Recognition in Still Images

    Full text link
    One significant problem of deep-learning based human action recognition is that it can be easily misled by the presence of irrelevant objects or backgrounds. Existing methods commonly address this problem by employing bounding boxes on the target humans as part of the input, in both training and testing stages. This requirement of bounding boxes as part of the input is needed to enable the methods to ignore irrelevant contexts and extract only human features. However, we consider this solution is inefficient, since the bounding boxes might not be available. Hence, instead of using a person bounding box as an input, we introduce a human-mask loss to automatically guide the activations of the feature maps to the target human who is performing the action, and hence suppress the activations of misleading contexts. We propose a multi-task deep learning method that jointly predicts the human action class and human location heatmap. Extensive experiments demonstrate our approach is more robust compared to the baseline methods under the presence of irrelevant misleading contexts. Our method achieves 94.06\% and 40.65\% (in terms of mAP) on Stanford40 and MPII dataset respectively, which are 3.14\% and 12.6\% relative improvements over the best results reported in the literature, and thus set new state-of-the-art results. Additionally, unlike some existing methods, we eliminate the requirement of using a person bounding box as an input during testing.Comment: Accepted to appear in ACCV 201

    Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection

    Full text link
    Object detection has witnessed significant progress by relying on large, manually annotated datasets. Annotating such datasets is highly time consuming and expensive, which motivates the development of weakly supervised and few-shot object detection methods. However, these methods largely underperform with respect to their strongly supervised counterpart, as weak training signals \emph{often} result in partial or oversized detections. Towards solving this problem we introduce, for the first time, an online annotation module (OAM) that learns to generate a many-shot set of \emph{reliable} annotations from a larger volume of weakly labelled images. Our OAM can be jointly trained with any fully supervised two-stage object detection method, providing additional training annotations on the fly. This results in a fully end-to-end strategy that only requires a low-shot set of fully annotated images. The integration of the OAM with Fast(er) R-CNN improves their performance by 17%17\% mAP, 9%9\% AP50 on PASCAL VOC 2007 and MS-COCO benchmarks, and significantly outperforms competing methods using mixed supervision.Comment: Accepted at ECCV 2020. Camera-ready version and Appendice

    Weakly Supervised Semantic Segmentation Using Constrained Dominant Sets

    Full text link
    The availability of large-scale data sets is an essential pre-requisite for deep learning based semantic segmentation schemes. Since obtaining pixel-level labels is extremely expensive, supervising deep semantic segmentation networks using low-cost weak annotations has been an attractive research problem in recent years. In this work, we explore the potential of Constrained Dominant Sets (CDS) for generating multi-labeled full mask predictions to train a fully convolutional network (FCN) for semantic segmentation. Our experimental results show that using CDS's yields higher-quality mask predictions compared to methods that have been adopted in the literature for the same purpose

    LifeCLEF 2016: Multimedia Life Species Identification Challenges

    Get PDF
    International audienceUsing multimedia identification tools is considered as one of the most promising solutions to help bridge the taxonomic gap and build accurate knowledge of the identity, the geographic distribution and the evolution of living species. Large and structured communities of nature observers (e.g., iSpot, Xeno-canto, Tela Botanica, etc.) as well as big monitoring equipment have actually started to produce outstanding collections of multimedia records. Unfortunately, the performance of the state-of-the-art analysis techniques on such data is still not well understood and is far from reaching real world requirements. The LifeCLEF lab proposes to evaluate these challenges around 3 tasks related to multimedia information retrieval and fine-grained classification problems in 3 domains. Each task is based on large volumes of real-world data and the measured challenges are defined in collaboration with biologists and environmental stakeholders to reflect realistic usage scenarios. For each task, we report the methodology, the data sets as well as the results and the main outcom

    Vehicle Detection Using Alex Net and Faster R-CNN Deep Learning Models: A Comparative Study

    Get PDF
    This paper has been presented at : 5th International Visual Informatics Conference (IVIC 2017)This paper presents a comparative study of two deep learning models used here for vehicle detection. Alex Net and Faster R-CNN are compared with the analysis of an urban video sequence. Several tests were carried to evaluate the quality of detections, failure rates and times employed to complete the detection task. The results allow to obtain important conclusions regarding the architectures and strategies used for implementing such network for the task of video detection, encouraging future research in this topic.S.A. Velastin is grateful to funding received from the Universidad Carlos III de Madrid, the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, el Ministerio de Economía y Competitividad (COFUND2013-51509) and Banco Santander. The authors wish to thank Dr. Fei Yin for the code for metrics employed for evaluations. Finally, we gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research. The data and code used for this work is available upon request from the authors

    LifeCLEF 2015: Multimedia Life Species Identification Challenges

    Get PDF
    International audienceUsing multimedia identification tools is considered as one of the most promising solutions to help bridging the taxonomic gap and build accurate knowledge of the identity, the geographic distribution and the evolution of living species. Large and structured communities of nature observers (e.g. eBird, Xeno-canto, Tela Botanica, etc.) as well as big monitoring equipments have actually started to produce outstanding collections of multimedia records. Unfortunately, the performance of the state-of-the-art analysis techniques on such data is still not well understood and is far from reaching the real world’s requirements. The LifeCLEF lab proposes to evaluate these challenges around three tasks related to multimedia information retrieval and fine-grained classification problems in three living worlds. Each task is based on large and real-world data and the measured challenges are defined in collaboration with biologists and environmental stakeholders in order to reflect realistic usage scenarios. This paper presents more particularly the 2014 edition of LifeCLEF, i.e. the pilot one. For each of the three tasks, we report the methodology and the datasets as well as the official results and the main outcomes

    Early esophageal adenocarcinoma detection using deep learning methods

    Get PDF
    Purpose This study aims to adapt and evaluate the performance of different state-of-the-art deep learning object detection methods to automatically identify esophageal adenocarcinoma (EAC) regions from high-definition white light endoscopy (HD-WLE) images. Method Several state-of-the-art object detection methods using Convolutional Neural Networks (CNNs) were adapted to automatically detect abnormal regions in the esophagus HD-WLE images, utilizing VGG’16 as the backbone architecture for feature extraction. Those methods are Regional-based Convolutional Neural Network (R-CNN), Fast R-CNN, Faster R-CNN and Single-Shot Multibox Detector (SSD). For the evaluation of the different methods, 100 images from 39 patients that have been manually annotated by five experienced clinicians as ground truth have been tested. Results Experimental results illustrate that the SSD and Faster R-CNN networks show promising results, and the SSD outperforms other methods achieving a sensitivity of 0.96, specificity of 0.92 and F-measure of 0.94. Additionally, the Average Recall Rate of the Faster R-CNN in locating the EAC region accurately is 0.83. Conclusion In this paper, recent deep learning object detection methods are adapted to detect esophageal abnormalities automatically. The evaluation of the methods proved its ability to locate abnormal regions in the esophagus from endoscopic images. The automatic detection is a crucial step that may help early detection and treatment of EAC and also can improve automatic tumor segmentation to monitor its growth and treatment outcome

    Detecting People in Artwork with CNNs

    Get PDF
    CNNs have massively improved performance in object detection in photographs. However research into object detection in artwork remains limited. We show state-of-the-art performance on a challenging dataset, People-Art, which contains people from photos, cartoons and 41 different artwork movements. We achieve this high performance by fine-tuning a CNN for this task, thus also demonstrating that training CNNs on photos results in overfitting for photos: only the first three or four layers transfer from photos to artwork. Although the CNN's performance is the highest yet, it remains less than 60\% AP, suggesting further work is needed for the cross-depiction problem. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-46604-0_57Comment: 14 pages, plus 3 pages of references; 7 figures in ECCV 2016 Workshop
    corecore